
docs: enrich module docstrings and add doctest examples #1498

Open
timsaucer wants to merge 1 commit into apache:main from timsaucer:feat/module-docstrings

@timsaucer
Member

Which issue does this PR close?

Part of #1394. This is "PR 1b" from the implementation plan in
#1394 (comment).

Rationale for this change

The per-module docstrings for functions.py, dataframe.py, expr.py,
and context.py were one-line summaries that pointed at the online
docs without explaining the module's role or giving any example. That
makes the repo harder to navigate both for humans skimming the source
and for AI coding assistants that can only see what ships with the
package. Several of the most commonly used DataFrame methods also
lacked runnable examples, even though peer methods (intersect,
except_all, distinct_on, union_by_name, join_on, ...) had
already been brought up to the project's example-in-docstring
convention.

What changes are included in this PR?

  • Enriched module docstrings for functions.py, dataframe.py,
    expr.py, and context.py. Each now opens with a one-line summary
    of the type's role, a paragraph of concept/usage guidance with
    :py:class: / :py:meth: cross-references, a compact doctest, and
    a :ref: pointer into the docs site.
  • Added doctest examples to six high-traffic DataFrame methods:
    select, aggregate, sort, limit, join, and union.
    Optional parameters are passed with keyword syntax, and examples
    reuse the same input data across variants so the effect of each
    option is easy to see.
  • pytest --doctest-modules is clean (266 → 276 passing doctests);
    full suite passes locally.
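As a rough illustration of the example-in-docstring convention described above, here is a minimal, standard-library-only sketch. `limit_rows` is a hypothetical stand-in that works on a plain list, not the DataFusion `DataFrame.limit` method; it only shows the shape of a docstring whose examples reuse one input and pass the optional parameter by keyword, and how the `doctest` module can execute those examples.

```python
import doctest


def limit_rows(rows, count, offset=0):
    """Return at most ``count`` rows, skipping the first ``offset`` rows.

    Hypothetical helper used only to illustrate the docstring layout.

    Examples:
        >>> rows = [1, 2, 3, 4]
        >>> limit_rows(rows, 2)
        [1, 2]
        >>> limit_rows(rows, 2, offset=1)
        [2, 3]
    """
    return rows[offset : offset + count]


# Parse the docstring into a DocTest and run it, roughly what a doctest
# collector does for every docstring it finds.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    limit_rows.__doc__, {"limit_rows": limit_rows}, "limit_rows", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results)  # TestResults(failed=0, attempted=3)
```

Each `>>>` line counts as one attempted example, so the three prompts above account for the three attempts reported.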

Are there any user-facing changes?

Documentation only — no API changes.

Expands the module docstrings for `functions.py`, `dataframe.py`,
`expr.py`, and `context.py` so each module opens with a concept summary,
cross-references to related APIs, and a small executable example.

Adds doctest examples to the high-traffic `DataFrame` methods that
previously lacked them: `select`, `aggregate`, `sort`, `limit`, `join`,
and `union`. Optional parameters are demonstrated with keyword syntax,
and examples reuse the same input data across variants so the effect of
each option is easy to see.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# under the License.

"""Session Context and it's associated configuration."""
""":py:class:`SessionContext` — entry point for running DataFusion queries.
Contributor

If we expect to change a lot of the website content, it would be nice to generate a preview in CI, as long as that is not too expensive.

>>> df = ctx.from_pydict(
... {"team": ["x", "x", "y"], "score": [1, 2, 3]}
... )
>>> df.aggregate([], [F.sum(col("score")).alias("total")]).to_pydict()
Contributor

I wonder if it would feel more Pythonic to accept `None` in addition to the empty list, given that the description is "No grouping".
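One way the suggestion could look, as a sketch only: `normalize_group_by` is a hypothetical helper, not part of this PR or of the DataFusion API, and it uses plain strings in place of expressions. It assumes that `None` and an empty sequence should both mean "no grouping".

```python
from typing import Optional, Sequence, Union


def normalize_group_by(
    group_by: Optional[Union[str, Sequence[str]]],
) -> list:
    """Return the grouping columns as a list (hypothetical helper).

    ``None`` and an empty sequence both mean "no grouping".
    """
    if group_by is None:
        return []
    if isinstance(group_by, str):
        # A bare string is one column name, not a sequence of characters.
        return [group_by]
    return list(group_by)


print(normalize_group_by(None))      # []
print(normalize_group_by([]))        # []
print(normalize_group_by("team"))    # ['team']
```

With a helper like this, callers could write either an empty list or `None` for the ungrouped case, and the rest of the method would only ever see a list.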

>>> df.aggregate(
... [col("team")], [F.sum(col("score")).alias("total")]
... ).sort("team").to_pydict()
{'team': ['x', 'y'], 'total': [3, 3]}
Contributor

NIT: It would be nice if the two teams' totals weren't the same, to avoid any chance of ambiguity.

>>> df = ctx.from_pydict(
...     {"team": ["x", "x", "y"], "score": [1, 2, 5]}
... )
>>> df.aggregate(
...     [col("team")], [F.sum(col("score")).alias("total")]
... ).sort("team").to_pydict()
{'team': ['x', 'y'], 'total': [3, 5]}
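To check the arithmetic behind the suggestion, here is a plain-Python stand-in for the grouped sum. `sum_by_team` is a hypothetical helper written for this comparison and does not call DataFusion; it only mimics the shape of the `to_pydict()` result for the two inputs.

```python
from collections import defaultdict


def sum_by_team(teams, scores):
    """Sum scores per team and return columns sorted by team name."""
    totals = defaultdict(int)
    for team, score in zip(teams, scores):
        totals[team] += score
    ordered = sorted(totals)
    return {"team": ordered, "total": [totals[t] for t in ordered]}


original = sum_by_team(["x", "x", "y"], [1, 2, 3])
suggested = sum_by_team(["x", "x", "y"], [1, 2, 5])
print(original)   # {'team': ['x', 'y'], 'total': [3, 3]}
print(suggested)  # {'team': ['x', 'y'], 'total': [3, 5]}
```

With the original scores both teams total 3, so the output cannot show which total belongs to which team; changing the last score to 5 makes the two totals distinct.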


2 participants